Read and spontaneous speech classification based on variance of GMM supervectors
Authors
Abstract
This paper presents a novel method for classifying spoken utterances as read or spontaneous speech. Read/spontaneous classification matters when extracting training data for speech recognition acoustic models from real-world data in which read and spontaneous speech samples are mixed. We analyzed 23,900 read and 31,988 spontaneous utterances from 30 speakers and found that the variance of GMM supervectors across several consecutive utterances discriminates between the two styles and is less speaker-dependent. Based on this finding, our method uses the variance of GMM supervectors to classify unknown consecutive utterances as read or spontaneous. Experiments show that the technique classifies 5 consecutive utterances from unknown speakers with over 95% accuracy, without any additional lexical, phonetic, or prosodic features.
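As a rough illustration of the feature described in the abstract, the sketch below computes the per-dimension variance of GMM supervectors over a window of consecutive utterances. The abstract does not say how the supervectors are built, so this assumes the common construction of mean-only MAP adaptation of a diagonal-covariance universal background model (UBM). The function names, the relevance factor of 16, and the input layout (one array of feature frames per utterance) are all hypothetical choices made for the example, not details taken from the paper.

```python
import numpy as np

def map_adapted_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM to one utterance.

    frames: (T, D) feature frames; weights: (K,); means, variances: (K, D).
    The relevance factor is an assumed, commonly used default.
    """
    diff = frames[:, None, :] - means[None, :, :]                # (T, K, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                            # (T, K)
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)              # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                      # responsibilities
    n_k = post.sum(axis=0)                                       # soft counts (K,)
    data_means = (post.T @ frames) / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * data_means + (1.0 - alpha) * means            # (K, D)

def supervector(frames, weights, means, variances):
    """Stack the adapted component means into one (K * D,) vector."""
    return map_adapted_means(frames, weights, means, variances).ravel()

def supervector_variance(utterances, weights, means, variances):
    """Per-dimension variance of supervectors over consecutive utterances."""
    vectors = np.stack(
        [supervector(u, weights, means, variances) for u in utterances]
    )
    return vectors.var(axis=0)
```

The resulting variance vector for each window (for example, 5 consecutive utterances) could then be passed to any standard classifier. The abstract does not name the classifier used, so that step is left out of the sketch.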
Similar resources
Speaker Diarization Based on GMM Supervectors and Unsupervised Intra-speaker Variability Modeling
This paper presents a novel framework for speaker diarization. Audio is parameterized by a sequence of GMM supervectors representing overlapping short segments of speech. Session-dependent intra-session intra-speaker variability is estimated online in an unsupervised manner and is removed from the supervectors using Nuisance Attribute Projection (NAP). The supervectors are then projected using ...
Combining five acoustic level modeling methods for automatic speaker age and gender recognition
This paper presents a novel automatic speaker age and gender identification approach which combines five different methods at the acoustic level to improve the baseline performance. The five subsystems are (1) Gaussian mixture model (GMM) system based on mel-frequency cepstral coefficient (MFCC) features, (2) Support vector machine (SVM) based on GMM mean supervectors, (3) SVM based on GMM maxi...
Automatic speaker age and gender recognition using acoustic and prosodic level information fusion
The paper presents a novel automatic speaker age and gender identification approach which combines seven different methods at both acoustic and prosodic levels to improve the baseline performance. The three baseline subsystems are (1) Gaussian mixture model (GMM) based on mel-frequency cepstral coefficient (MFCC) features, (2) Support vector machine (SVM) based on GMM mean supervectors and (3) SVM...
An Integrated Solution for Snoring Sound Classification Using Bhattacharyya Distance Based GMM Supervectors with SVM, Feature Selection with Random Forest and Spectrogram with CNN
Snoring is caused by the narrowing of the upper airway and is excited at different locations within the upper airways. This irregularity could lead to Obstructive Sleep Apnea Syndrome (OSAS). Diagnosis of OSAS could therefore be made by snoring sound analysis. This paper proposes a novel method to automatically classify snoring sounds by their excitation locations for ComPa...
Exploiting supervector structure for speaker recognition trained on a small development set
Nowadays, state-of-the-art speaker recognition systems obtain quite satisfactory results for both text-independent and text-dependent tasks as long as they are trained on a fair amount of development data from the target domain (assuming clean speech). In this work, we investigate the ability to build accurate speaker recognition systems using small amounts of data from the target domain without ...